Quantifying & characterizing information diets of social media users
An increasing number of people rely on online social media platforms such as Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in how news and information are exchanged in our society, from traditional mass media to online social media. In this changed environment, it is essential to study the information consumption of social media users and to audit how automated algorithms (such as search and recommendation systems) modify the information that social media users consume. In this thesis, we pursue this goal with a two-fold approach. First, we propose the concept of an information diet as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word-of-mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to users, and (c) search systems that users employ to find interesting information on these platforms. We measure the information diets of social media users along three dimensions: topics, geographic sources, and political perspectives. Our work aims to make social media users aware of the potential biases in their consumed diets, and to encourage the development of novel mechanisms for mitigating the effects of these biases.
Do Neural Ranking Models Intensify Gender Bias?
Concerns regarding the footprint of societal biases in information retrieval
(IR) systems have been raised in several previous studies. In this work, we
examine various recent IR models from the perspective of the degree of gender
bias in their retrieval results. To this end, we first provide a bias
measurement framework which includes two metrics to quantify the degree of the
unbalanced presence of gender-related concepts in a given IR model's ranking
list. To examine IR models by means of the framework, we create a dataset of
non-gendered queries, selected by human annotators. Applying these queries to
the MS MARCO Passage retrieval collection, we then measure the gender bias of a
BM25 model and several recent neural ranking models. The results show that
while all models are strongly biased toward males, the neural models, and in
particular the ones based on contextualized embedding models, significantly
intensify gender bias. Our experiments also show an overall increase in the
gender bias of neural models when they exploit transfer learning, namely when
they use (already biased) pre-trained embeddings.
Comment: In Proceedings of ACM SIGIR 202
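The abstract does not spell out the two metrics of the bias measurement framework. As a purely illustrative sketch (the word lists and averaging scheme below are assumptions, not the paper's actual definitions), one naive way to score the unbalanced presence of gender-related concepts in a ranking list is to count gender-associated terms in the top-k retrieved passages:

```python
# Hypothetical sketch only: the term lists and averaging scheme are
# assumptions, not the metric definitions used in the paper.

MALE_TERMS = {"he", "him", "his", "man", "men", "male"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "female"}

def passage_bias(text: str) -> float:
    """Signed gender bias of one passage: positive means male-leaning."""
    tokens = text.lower().split()
    male = sum(t in MALE_TERMS for t in tokens)
    female = sum(t in FEMALE_TERMS for t in tokens)
    total = male + female
    return 0.0 if total == 0 else (male - female) / total

def ranking_bias(ranked_passages: list[str], k: int = 10) -> float:
    """Average signed bias over the top-k passages of a ranking list."""
    top = ranked_passages[:k]
    return sum(passage_bias(p) for p in top) / len(top)

ranking = [
    "he said the man was late",
    "she told her colleague the results",
    "the committee approved the budget",
]
print(ranking_bias(ranking))  # 0.0 (the two gendered passages cancel out)
```

Comparing such scores between a BM25 ranking and a neural ranking for the same non-gendered query is the kind of model-level contrast the study draws.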
Characterizing Information Diets of Social Media Users
With the widespread adoption of social media sites like Twitter and Facebook,
there has been a shift in the way information is produced and consumed.
Earlier, the only producers of information were traditional news organizations,
which broadcast the same carefully-edited information to all consumers over
mass media channels. Now, in online social media, any user can be a
producer of information, and every user selects which other users she connects
to, thereby choosing the information she consumes. Moreover, the personalized
recommendations that most social media sites provide also contribute towards
the information consumed by individual users. In this work, we define a concept
of information diet -- which is the topical distribution of a given set of
information items (e.g., tweets) -- to characterize the information produced
and consumed by various types of users on the popular social media site Twitter. At
a high level, we find that (i) popular users mostly produce very specialized
diets focusing on only a few topics; in fact, news organizations (e.g.,
NYTimes) produce much more focused diets on social media as compared to their
mass media diets, (ii) most users' consumption diets are primarily focused
towards one or two topics of their interest, and (iii) the personalized
recommendations provided by Twitter help to mitigate some of the topical
imbalances in the users' consumption diets, by adding information on diverse
topics apart from the users' primary topics of interest.Comment: In Proceeding of International AAAI Conference on Web and Social
Media (ICWSM), Oxford, UK, May 201
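The definition above, an information diet as the topical distribution of a set of items, fits in a few lines. The topic labels below are hypothetical and stand in for the topic inference the paper performs on tweets:

```python
from collections import Counter

def information_diet(item_topics: list[str]) -> dict[str, float]:
    """Topical distribution: fraction of information items per topic."""
    counts = Counter(item_topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

# Topics of tweets consumed by one hypothetical user.
consumed = ["politics", "politics", "sports", "politics", "technology"]
print(information_diet(consumed))
# {'politics': 0.6, 'sports': 0.2, 'technology': 0.2}
```

A diet concentrated on one or two topics, as in this example, is what the paper calls a focused consumption diet.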
Where the earth is flat and 9/11 is an inside job: A comparative algorithm audit of conspiratorial information in web search results
Web search engines are important online information intermediaries that are frequently used and highly trusted by the public, despite ample evidence that their outputs are subject to inaccuracies and biases. One form of such inaccuracy, which has so far received little scholarly attention, is the presence of conspiratorial information, namely pages promoting conspiracy theories. We address this gap by conducting a comparative algorithm audit to examine the distribution of conspiratorial information in search results across five search engines: Google, Bing, DuckDuckGo, Yahoo and Yandex. Using a virtual agent-based infrastructure, we systematically collect search outputs for six conspiracy theory-related queries (“flat earth”, “new world order”, “qanon”, “9/11”, “illuminati”, “george soros”) across three locations (two in the US and one in the UK) and two waves (March and May 2021). We find that all search engines except Google consistently displayed conspiracy-promoting results and returned links to conspiracy-dedicated websites, with variations across queries. Most conspiracy-promoting results came from social media and conspiracy-dedicated websites, while conspiracy-debunking information was shared by scientific websites and legacy media. These observations are consistent across different locations and time periods, highlighting the possibility that some engines systematically prioritize conspiracy-promoting content.
Novelty in news search: a longitudinal study of the 2020 US elections
The 2020 US elections news coverage was extensive, with new pieces of
information generated rapidly. This evolving scenario presented an opportunity
to study the performance of search engines in a context in which they had to
quickly process information as it was published. We analyze novelty, a
measurement of new items that emerge in the top news search results, to compare
the coverage and visibility of different topics. We conduct a longitudinal
study of news results of five search engines collected in short bursts (every
21 minutes) from two regions (Oregon, US and Frankfurt, Germany), starting on
election day and lasting until one day after the announcement of Biden as the
winner. We find more new items emerging for election related queries ("joe
biden", "donald trump" and "us elections") compared to topical (e.g.,
"coronavirus") or stable (e.g., "holocaust") queries. We demonstrate
differences across search engines and regions over time, and we highlight
imbalances between candidate queries. When it comes to news search, search
engines are responsible for such imbalances, either due to their algorithms or
the set of news sources they rely on. We argue that such imbalances affect the
visibility of political candidates in news searches during electoral periods.
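The novelty measurement admits a simple sketch: compare consecutive snapshots of the top news results and count items not seen before. The snapshot contents below are invented for illustration:

```python
def novelty(previous: list[str], current: list[str]) -> float:
    """Fraction of current top results absent from the previous snapshot."""
    prev = set(previous)
    return sum(url not in prev for url in current) / len(current)

# Two hypothetical snapshots of top news results, 21 minutes apart.
snapshot_t0 = ["a.com/1", "b.com/2", "c.com/3", "d.com/4"]
snapshot_t1 = ["a.com/1", "e.com/5", "c.com/3", "f.com/6"]
print(novelty(snapshot_t0, snapshot_t1))  # 0.5
```

Tracking this fraction per query, engine, and region over time yields the kind of longitudinal comparison the study reports.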
What do Twitter comments tell about news article bias? Assessing the impact of news article bias on its perception on Twitter
News stories circulating online, especially on social media platforms, are nowadays a primary source of information. Given the nature of social media, news items no longer stand alone; they are embedded in the conversations of users interacting with them. This is particularly relevant for inaccurate information or even outright misinformation, because user interaction has a crucial impact on whether information is disseminated uncritically or not. Biased coverage has been shown to affect personal decision-making. Still, it remains an open question whether users are aware of the biased reporting they encounter and how they react to it. The latter is particularly relevant given that user reactions help contextualize reporting for other users and can thus mitigate, but may also exacerbate, the impact of biased media coverage.
This paper approaches the question from a measurement point of view, examining whether reactions to news articles on Twitter can serve as bias indicators, i.e., whether how users comment on a given article relates to its actual level of bias. We first give an overview of research on media bias before discussing key concepts related to how individuals engage with online content, focusing on the sentiment (or valence) of comments and on outright hate speech. We then present the first dataset connecting reliable human-made media bias classifications of news articles with the reactions these articles received on Twitter. We call our dataset BAT - Bias And Twitter. BAT covers 2,800 (bias-rated) news articles from 255 English-speaking news outlets. Additionally, BAT includes 175,807 comments and retweets referring to the articles.
Based on BAT, we conduct a multi-feature analysis to identify comment characteristics and analyze whether Twitter reactions correlate with an article’s bias. First, we fine-tune and apply two XLNet-based classifiers for hate speech detection and sentiment analysis. Second, we relate the results of the classifiers to the article bias annotations within a multi-level regression. The results show that Twitter reactions to an article indicate its bias, and vice versa. With a regression coefficient of 0.703, we present evidence that Twitter reactions to biased articles are significantly more hateful. Our analysis shows that the news outlet’s individual stance reinforces the hate-bias relationship. In future work, we will extend the dataset and analysis, including additional concepts related to media bias.
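The multi-level regression itself is beyond a short sketch, but the core idea of relating per-article comment hate scores to bias labels can be illustrated with a plain Pearson correlation. The scores and labels below are invented; the study uses classifier outputs and a far richer model:

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-article data: bias label (0 = unbiased, 1 = biased)
# and the mean hate score of that article's Twitter comments.
bias_labels = [0, 0, 1, 1, 1, 0]
hate_scores = [0.10, 0.15, 0.60, 0.70, 0.55, 0.20]
print(round(pearson(bias_labels, hate_scores), 3))
```

A strong positive value on data like this is the shape of the hate-bias relationship the paper reports, though the actual analysis controls for outlet-level effects.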
Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data
Understanding human activities and movements on the Web is not only important
for computational social scientists but can also offer valuable guidance for
the design of online systems for recommendations, caching, advertising, and
personalization. In this work, we demonstrate that people tend to follow
routines on the Web, and these repetitive patterns of web visits increase their
browsing behavior's achievable predictability. We present an
information-theoretic framework for measuring the uncertainty and theoretical
limits of predictability of human mobility on the Web. We systematically assess
the impact of different design decisions on the measurement. We apply the
framework to a web tracking dataset of German internet users. Our empirical
results highlight that individuals' routines on the Web make their browsing
behavior predictable to 85% on average, though the value varies across
individuals. We observe that these differences in the users' predictabilities
can be explained to some extent by their demographic and behavioral attributes.
Comment: 12 pages, 8 figures. To be published in the proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 202
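The theoretical limit of predictability in such frameworks is typically obtained by solving Fano's inequality for the entropy of the visit sequence. A minimal sketch, using a simple unigram entropy in place of the entropy-rate estimators such a framework would actually employ (so this is an illustration, not the paper's measurement):

```python
import math
from collections import Counter

def unigram_entropy(visits: list[str]) -> float:
    """Shannon entropy (bits) of the visited-site frequency distribution."""
    counts = Counter(visits)
    n = len(visits)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def max_predictability(entropy: float, n_sites: int) -> float:
    """Solve Fano's inequality H = H_b(p) + (1 - p) log2(n - 1) for p."""
    def fano(p: float) -> float:
        h_b = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h_b + (1 - p) * math.log2(n_sites - 1)
    # Bisection on the decreasing branch, p in [1/n, 1).
    lo, hi = 1.0 / n_sites, 1.0 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A hypothetical visit sequence dominated by one site.
visits = ["news"] * 8 + ["mail"] + ["shop"]
pi_max = max_predictability(unigram_entropy(visits), n_sites=3)
print(round(pi_max, 3))  # 0.8
```

Lower entropy (more routine) pushes the bound toward 1, which is the mechanism behind the 85% average predictability reported above.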
Who are the users of ChatGPT? Implications for the digital divide from web tracking data
A major challenge of our time is reducing disparities in access to and
effective use of digital technologies, with recent discussions highlighting the
role of AI in exacerbating the digital divide. We examine user characteristics
that predict usage of the AI-powered conversational agent ChatGPT. We combine
web tracking and survey data of N=1068 German citizens to investigate
differences in activity (usage, visits and duration on chat.openai.com). We
examine socio-demographics commonly associated with the digital divide and
explore further socio-political attributes identified via stability selection
in Lasso regressions. We confirm that lower age and more education predict ChatGPT
usage, but gender and income do not. We find full-time employment and more
children to be barriers to ChatGPT activity. Rural residence, writing and
social media activities, as well as more political knowledge, were positively
associated with ChatGPT activity. Our research informs efforts to address
digital disparities and promote digital literacy among underserved populations.
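Stability selection with Lasso regressions, as mentioned above, can be sketched as repeated subsampling plus selection-frequency counting. The bare-bones coordinate-descent Lasso and the synthetic data below are simplified stand-ins, not the study's actual pipeline:

```python
import numpy as np

def lasso(X, y, alpha, n_iter=200):
    """Coordinate descent for min (1/2n)||y - Xw||^2 + alpha ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid = y - X @ w + X[:, j] * w[j]  # residual excluding j
            rho = X[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

def stability_selection(X, y, alpha, n_runs=50, frac=0.5, seed=0):
    """Selection frequency of each feature over random row subsamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    freq = np.zeros(X.shape[1])
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        freq += lasso(X[idx], y[idx], alpha) != 0
    return freq / n_runs

# Synthetic data: only features 0 and 2 actually drive the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)
print(stability_selection(X, y, alpha=0.1))
```

Features selected in a large share of subsample fits are treated as stable predictors; the others are discarded as noise, which is how additional socio-political attributes can be screened robustly.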
Gender, Age, and Technology Education Influence the Adoption and Appropriation of LLMs
Large Language Models (LLMs) such as ChatGPT have become increasingly
integrated into critical activities of daily life, raising concerns about
equitable access and utilization across diverse demographics. This study
investigates the usage of LLMs among 1,500 representative US citizens.
Remarkably, 42% of participants reported utilizing an LLM. Our findings reveal
a gender gap in LLM technology adoption (more male users than female users)
with complex interaction patterns regarding age. Technology-related education
eliminates the gender gap in our sample. Moreover, expert users are more likely
than novices to list professional tasks as typical application scenarios,
suggesting discrepancies in effective usage at the workplace. These results
underscore the importance of providing education in artificial intelligence in
our technology-driven society to promote equitable access to and benefits from
LLMs. We urge for both international replication beyond the US and longitudinal
observation of adoption.
Misinformation, Believability, and Vaccine Acceptance Over 40 Countries: Takeaways From the Initial Phase of The COVID-19 Infodemic
The COVID-19 pandemic has been damaging to the lives of people all around the
world. Accompanied by the pandemic is an infodemic, an abundant and
uncontrolled spreading of potentially harmful misinformation. The infodemic may
severely change the pandemic's course by interfering with public health
interventions such as wearing masks, social distancing, and vaccination. In
particular, the impact of the infodemic on vaccination is critical because it
holds the key to reverting to pre-pandemic normalcy. This paper presents
findings from a global survey on the extent of worldwide exposure to the
COVID-19 infodemic, assesses different populations' susceptibility to false
claims, and analyzes its association with vaccine acceptance. Based on
responses gathered from over 18,400 individuals from 40 countries, we find a
strong association between perceived believability of misinformation and
vaccination hesitancy. Additionally, our study shows that only half of the
online users exposed to rumors might have seen the fact-checked information.
Moreover, depending on the country, between 6% and 37% of individuals
considered these rumors believable. Our survey also shows that poorer regions
are more susceptible to encountering and believing COVID-19 misinformation. We
discuss implications of our findings on public campaigns that proactively
spread accurate information to countries that are more susceptible to the
infodemic. We also highlight fact-checking platforms' role in better
identifying and prioritizing claims that are perceived to be believable and
have wide exposure. Our findings give insights into better handling of risk
communication during the initial phase of a future pandemic.